Non-linear Mapping for Improved Identification of 1300+ Languages

Author

Ralf D. Brown

Abstract

Non-linear mappings of the form $P(\text{ngram})^{\gamma}$ and $\frac{\log(1+\tau P(\text{ngram}))}{\log(1+\tau)}$ are applied to the n-gram probabilities in five trainable open-source language identifiers. The first mapping reduces classification errors by 4.0% to 83.9% over a test set of more than one million 65-character strings in 1366 languages, and by 2.6% to 76.7% over a subset of 781 languages. The second mapping improves four of the five identifiers by 10.6% to 83.8% on the larger corpus and 14.4% to 76.7% on the smaller corpus. The subset corpus and the modified programs are made freely available for download at http://www.cs.cmu.edu/~ralf/langid.html.
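As a rough illustration of the two mappings named in the abstract, the power mapping P(ngram)^γ and the logarithmic mapping log(1 + τ·P(ngram)) / log(1 + τ), the following minimal Python sketch applies each to a single probability. The function names and the values chosen for γ and τ are assumptions made for illustration only; the abstract does not state which parameter values the paper uses, nor how the five identifiers integrate the mapped values.

```python
import math


def power_map(p: float, gamma: float) -> float:
    """Power mapping from the abstract: p ** gamma.

    The name and the input check are illustrative, not taken from the paper.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability in [0, 1]")
    return p ** gamma


def log_map(p: float, tau: float) -> float:
    """Logarithmic mapping from the abstract: log(1 + tau*p) / log(1 + tau).

    Dividing by log(1 + tau) keeps the result in [0, 1], with 0 -> 0 and 1 -> 1.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability in [0, 1]")
    if tau <= 0.0:
        raise ValueError("tau must be positive")
    return math.log1p(tau * p) / math.log1p(tau)


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only to show the shape of the curves.
    gamma, tau = 0.5, 100.0
    for p in (1e-6, 1e-3, 0.1, 1.0):
        print(f"p={p:g}  power={power_map(p, gamma):.6f}  log={log_map(p, tau):.6f}")
```

Both functions fix the endpoints 0 and 1. With γ < 1 or a large τ, as assumed here, small probabilities are raised relative to large ones, which changes how much weight rare n-grams carry.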


Similar resources

The Effect of Concept Mapping on Iranian EFL Learners’ Vocabulary Learning and Strategy Use

This study aimed to investigate the effects of concept mapping on the extent to which Iranian EFL learners retain new vocabularies and the degree of awareness toward vocabulary learning strategies they tended to use. To this end, a total of 40 Iranian EFL students were asked to participate in this study. They were randomly assigned to two equal groups; namely, experimental and control. The part...


A comparative study of quantitative mapping methods for bias correction of ERA5 reanalysis precipitation data

This study evaluates the ability of different quantitative mapping (QM) methods as a bias correction technique for ERA5 reanalysis precipitation data. Climate type and geographical location can affect the performance of the bias correction method due to differences in precipitation characteristics. For this purpose, ERA5 reanalysis precipitation data for the years 1989-2019 for 10 selected syno...


Collocational Processing in Two Languages: A psycholinguistic comparison of monolinguals and bilinguals

With the renewed interest in the field of second language learning for the knowledge of collocating words, research findings in favour of holistic processing of formulaic language could support the idea that these language units facilitate efficient language processing. This study investigated the difference between processing of a first language (L1) and a second language (L2) of congruent col...


Comparison of Spectral Methods for Spoken Language Identification

Automatic spoken language identification is the task of determining the language from a speech signal. Language identification systems fall into two categories: spectral-based methods and phonetic-based methods. In the former, short-time characteristics of the speech spectrum are extracted as a multi-dimensional vector. A statistical model of these features is then obtained for each language. The ...


Identification of Multiple Input-multiple Output Non-linear System Cement Rotary Kiln using Stochastic Gradient-based Rough-neural Network

Because of the existing interactions among the variables of a multiple input-multiple output (MIMO) nonlinear system, its identification is a difficult task, particularly in the presence of uncertainties. Cement rotary kiln (CRK) is a MIMO nonlinear system in the cement factory with a complicated mechanism and uncertain disturbances. The identification of CRK is very important for different pur...




Publication date: 2014
